New Methods For Domain Adaptation And Low Data Deep Learning
Real-world data from settings such as hospital collections for disease detection are subject to multiple sources of distributional shift. These shifts degrade the performance of diagnostic methods, reducing the quality of service provided and leading to health or economic harm. Deep learning has emerged as a promising approach for classification tasks, including diagnostics, and recent progress has produced methods that allow a neural network to adapt its normalization statistics to shifts in specific settings at test time. However, these methods struggle to adapt to more general shifts and domains, and they underperform when data is limited. In our first contribution, we tackle general domain shifts by investigating the key issues that cause test-time adaptation algorithms to fail under label shift, and we propose a means of mitigating these failures. In the second contribution, we tackle few-shot cross-domain adaptation by modifying the affine parameters of batch normalization during few-shot training, generally enhancing performance. The third contribution parameterizes Scattering Networks, enhancing a method suited to low-data regimes by providing problem-specific adaptation.
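The second contribution's idea, updating only the affine parameters of batch normalization on a few labelled target examples while the backbone stays frozen, can be illustrated with a minimal numpy sketch. Everything here (the linear backbone, shapes, learning rate, and the toy regression loss) is invented for illustration and is not the thesis's actual model or training setup.

```python
import numpy as np

# Few-shot affine adaptation sketch: freeze the backbone weights W and
# take gradient steps only on the batch-norm affine parameters
# (gamma, beta) using a small support set.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))            # frozen backbone weights (hypothetical)
gamma, beta = np.ones(3), np.zeros(3)  # trainable affine parameters

x = rng.normal(size=(5, 4))            # 5-shot support set
y = rng.normal(size=(5, 3))            # toy targets

losses = []
for _ in range(200):
    h = x @ W                                     # frozen feature extractor
    h_hat = (h - h.mean(0)) / (h.std(0) + 1e-5)   # batch-normalized features
    err = gamma * h_hat + beta - y
    losses.append(0.5 * np.mean(err ** 2))
    # Gradients flow only to gamma and beta; W receives no update.
    gamma -= 0.1 * (err * h_hat).mean(0)
    beta -= 0.1 * err.mean(0)
```

Because only a handful of parameters are updated, this kind of adaptation remains feasible when the support set is tiny, which is the point of restricting few-shot training to the affine parameters.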
Channel selection for test-time adaptation under distribution shift
To ensure robustness and generalization in real-world scenarios, test-time adaptation has recently been studied as an approach to adjusting models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain-shift benchmarks by recalculating batch normalization statistics on test batches. However, in many practical applications this technique is vulnerable to label distribution shifts. We propose to tackle this challenge by adapting only selected channels in a deep network, avoiding the drastic adaptation that is sensitive to label shifts. We find that the adapted models significantly improve performance compared to baseline models and counteract unknown label shifts.
Simulated Annealing in Early Layers Leads to Better Generalization
Recently, a number of iterative learning methods have been introduced to improve generalization. These typically trade longer training time for improved generalization. LLF (later-layer forgetting) is a state-of-the-art method in this category: it strengthens learning in early layers by periodically re-initializing the last few layers of the network. Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initializing the later layers. Essentially, the later layers go through the normal gradient descent process, while the early layers go through short stints of gradient ascent followed by gradient descent. Extensive experiments on the popular Tiny-ImageNet benchmark and on a series of transfer learning and few-shot learning tasks show that we outperform LLF by a significant margin. We further show that, compared to normal training, LLF features, although improving on the target task, degrade transfer learning performance across all datasets we explored; in comparison, our method outperforms LLF across the same target datasets by a large margin. We also show that the prediction depth of our method is significantly lower than that of LLF and normal training, indicating better prediction performance on average.